[arrow-flight encode path] re-use flatbufferbuilder #10220

Open
Rich-T-kid wants to merge 2 commits into apache:main from Rich-T-kid:rich-T-kid/re-use-allocations

Conversation

@Rich-T-kid

Contributor

Which issue does this PR close?

Rationale for this change

The flat buffer builder was being allocated repeatedly on the encode path, when it could be created once and reset with `fbb.reset()`.

What changes are included in this PR?

Provides methods for setting and getting the fbb, so it no longer needs to be re-allocated on every call.
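As a rough illustration of the create-once, reset-per-call idea, here is a minimal sketch. `ScratchBuilder` and `encode_all` are hypothetical stand-ins built on a plain `Vec<u8>`; they are not the flatbuffers or arrow-ipc API and only mirror the pattern described above.

```rust
/// Hypothetical stand-in for a FlatBuffer-style builder: it owns a growable byte buffer.
struct ScratchBuilder {
    buf: Vec<u8>,
}

impl ScratchBuilder {
    fn with_capacity(cap: usize) -> Self {
        Self { buf: Vec::with_capacity(cap) }
    }

    /// Drops the contents but keeps the allocation (`Vec::clear` retains capacity).
    fn reset(&mut self) {
        self.buf.clear();
    }

    fn push_bytes(&mut self, bytes: &[u8]) {
        self.buf.extend_from_slice(bytes);
    }

    fn finished_data(&self) -> &[u8] {
        &self.buf
    }
}

/// Encodes every message with one builder that is reset between messages.
/// Returns the encoded messages and how often the backing buffer changed capacity.
fn encode_all(messages: &[&[u8]]) -> (Vec<Vec<u8>>, usize) {
    // Created once, outside the loop.
    let mut builder = ScratchBuilder::with_capacity(1024);
    let mut out = Vec::with_capacity(messages.len());
    let mut regrowths = 0;
    for &msg in messages {
        let cap_before = builder.buf.capacity();
        builder.reset();
        builder.push_bytes(msg);
        if builder.buf.capacity() != cap_before {
            regrowths += 1;
        }
        out.push(builder.finished_data().to_vec());
    }
    (out, regrowths)
}
```

For messages that fit in the initial capacity, the builder's buffer is never re-allocated, which is the saving the PR is after.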

Are these changes tested?

Are there any user-facing changes?

@Rich-T-kid

Contributor Author

cc @alamb

github-actions bot added the arrow (Changes to the arrow crate) label on Jun 26, 2026
@alamb

alamb commented Jun 27, 2026

Contributor

run benchmark flight

Comment thread arrow-ipc/src/compression.rs Outdated

    impl CompressionContext {
        /// Takes the stored fbb, leaving a zero-capacity placeholder. Must be returned via [`Self::return_fbb`].
        pub(crate) fn take_fbb(&mut self) -> FlatBufferBuilder<'static> {

Contributor

Do you need to provide a take/return? Would it be possible to just return a mut reference?

Like

    pub(crate) fn fbb_mut(&mut self) -> &mut FlatBufferBuilder<'static> {

I think that would simplify the code a bit (and wouldn't require creating an empty builder either)
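To make the two options concrete, here is a minimal sketch that compares them. `Builder` and `Context` are hypothetical stand-ins for `FlatBufferBuilder<'static>` and `CompressionContext`, and using `std::mem::take` for the take step is an assumption about how a placeholder might be left behind, not the PR's actual code.

```rust
use std::mem;

/// Hypothetical stand-in for `FlatBufferBuilder<'static>`.
#[derive(Default)]
struct Builder {
    buf: Vec<u8>,
}

/// Hypothetical stand-in for `CompressionContext`.
struct Context {
    fbb: Builder,
}

impl Context {
    fn new() -> Self {
        Self { fbb: Builder { buf: Vec::with_capacity(1024) } }
    }

    /// Take/return style: moves the builder out and leaves an empty placeholder behind.
    fn take_fbb(&mut self) -> Builder {
        mem::take(&mut self.fbb)
    }

    fn return_fbb(&mut self, fbb: Builder) {
        self.fbb = fbb;
    }

    /// Borrow style: no placeholder and no way to forget the return step.
    fn fbb_mut(&mut self) -> &mut Builder {
        &mut self.fbb
    }
}

fn encode_with_take(ctx: &mut Context, payload: &[u8]) -> Vec<u8> {
    let mut fbb = ctx.take_fbb();
    fbb.buf.clear();
    fbb.buf.extend_from_slice(payload);
    let out = fbb.buf.clone();
    // Skipping this call would silently drop the reused allocation.
    ctx.return_fbb(fbb);
    out
}

fn encode_with_borrow(ctx: &mut Context, payload: &[u8]) -> Vec<u8> {
    let fbb = ctx.fbb_mut();
    fbb.buf.clear();
    fbb.buf.extend_from_slice(payload);
    fbb.buf.clone()
}
```

Both helpers produce the same bytes and keep the original allocation. The borrow version has fewer steps, though take/return can still be needed when the caller must also borrow other parts of the context while the builder is in use.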

Contributor Author

makes sense to me.

@adriangbot


🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4817028889-721-8v8lx 6.12.85+ #1 SMP Mon May 11 08:17:35 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing rich-T-kid/re-use-allocations (e9f85aa) to 44f3772 (merge-base) diff
BENCH_NAME=flight
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench flight
BENCH_FILTER=
Results will be posted here when complete


@adriangbot


🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


group                                     main                                    rich-T-kid_re-use-allocations
-----                                     ----                                    -----------------------------
decode/fixed/65536x1                      1.00     48.4±0.09µs    40.4 GB/sec     1.08     52.1±1.25µs    37.5 GB/sec
decode/fixed/65536x4                      1.07   294.4±41.72µs    26.5 GB/sec     1.00   275.7±29.39µs    28.3 GB/sec
decode/fixed/65536x8                      7.99      4.5±0.49ms     3.5 GB/sec     1.00   563.9±12.87µs    27.7 GB/sec
decode/fixed/8192x1                       1.01      8.5±0.04µs    28.9 GB/sec     1.00      8.4±0.04µs    29.1 GB/sec
decode/fixed/8192x4                       1.00     28.8±0.27µs    33.9 GB/sec     1.12     32.2±0.22µs    30.4 GB/sec
decode/fixed/8192x8                       1.06     66.3±1.03µs    29.5 GB/sec     1.00     62.5±0.84µs    31.3 GB/sec
decode/nested/65536x1                     1.00  672.1±163.27µs     7.3 GB/sec     1.03  690.5±158.83µs     7.1 GB/sec
decode/nested/65536x4                     1.09      3.1±0.73ms     6.2 GB/sec     1.00      2.9±0.66ms     6.8 GB/sec
decode/nested/65536x8                     2.38     15.3±2.01ms     2.6 GB/sec     1.00      6.4±1.36ms     6.1 GB/sec
decode/nested/8192x1                      1.00    83.2±20.56µs     7.3 GB/sec     1.02    84.8±20.75µs     7.2 GB/sec
decode/nested/8192x4                      1.00   353.7±85.22µs     6.9 GB/sec     1.03   364.2±87.19µs     6.7 GB/sec
decode/nested/8192x8                      1.00  722.4±179.89µs     6.8 GB/sec     1.02  735.1±160.59µs     6.7 GB/sec
decode/variable/65536x1                   1.00  1189.8±159.74µs     7.4 GB/sec    1.10  1304.2±199.88µs     6.7 GB/sec
decode/variable/65536x4                   1.02      5.3±0.63ms     6.6 GB/sec     1.00      5.2±0.72ms     6.7 GB/sec
decode/variable/65536x8                   1.05     11.6±1.49ms     6.1 GB/sec     1.00     11.1±1.45ms     6.3 GB/sec
decode/variable/8192x1                    1.00   134.4±19.88µs     8.2 GB/sec     1.02   137.6±23.06µs     8.0 GB/sec
decode/variable/8192x4                    1.02   596.7±95.20µs     7.4 GB/sec     1.00   582.7±91.62µs     7.5 GB/sec
decode/variable/8192x8                    1.00  1262.1±197.66µs     7.0 GB/sec    1.01  1280.8±215.43µs     6.9 GB/sec
decode_stream/dict/65536x1x4              1.04   176.4±29.62µs     5.6 GB/sec     1.00   169.3±14.96µs     5.8 GB/sec
decode_stream/dict/65536x4x4              1.00  775.5±130.31µs     5.1 GB/sec     1.02  787.7±144.94µs     5.0 GB/sec
decode_stream/dict/65536x8x4              1.00  1584.9±189.64µs     5.0 GB/sec    1.01  1604.8±226.44µs     4.9 GB/sec
decode_stream/dict/8192x1x4               1.00     26.7±0.22µs     4.8 GB/sec     1.00     26.6±0.18µs     4.8 GB/sec
decode_stream/dict/8192x4x4               1.00    101.8±0.36µs     5.0 GB/sec     1.07    109.3±8.12µs     4.7 GB/sec
decode_stream/dict/8192x8x4               1.00    206.3±1.94µs     4.9 GB/sec     1.10   227.8±17.87µs     4.5 GB/sec
decode_stream/fixed/65536x1x4             1.00     49.3±0.24µs    39.6 GB/sec     1.08     53.3±0.97µs    36.7 GB/sec
decode_stream/fixed/65536x4x4             1.05   270.6±28.36µs    28.9 GB/sec     1.00    258.6±1.73µs    30.2 GB/sec
decode_stream/fixed/65536x8x4             1.00  585.4±104.88µs    26.7 GB/sec     1.12  656.3±130.13µs    23.8 GB/sec
decode_stream/fixed/8192x1x4              1.00      8.5±0.03µs    28.9 GB/sec     1.00      8.5±0.03µs    28.9 GB/sec
decode_stream/fixed/8192x4x4              1.02     29.6±0.24µs    33.0 GB/sec     1.00     29.0±0.16µs    33.7 GB/sec
decode_stream/fixed/8192x8x4              1.00     64.2±1.08µs    30.5 GB/sec     1.03     66.1±0.54µs    29.6 GB/sec
decode_stream/nested/65536x1x4            1.00  678.1±165.49µs     7.2 GB/sec     1.01  682.2±165.76µs     7.2 GB/sec
decode_stream/nested/65536x4x4            1.00      3.0±0.70ms     6.6 GB/sec     1.04      3.1±0.70ms     6.3 GB/sec
decode_stream/nested/65536x8x4            1.00      6.0±1.33ms     6.5 GB/sec     1.07      6.5±1.32ms     6.1 GB/sec
decode_stream/nested/8192x1x4             1.00    83.4±20.67µs     7.3 GB/sec     1.02    84.8±20.80µs     7.2 GB/sec
decode_stream/nested/8192x4x4             1.00   348.0±85.61µs     7.0 GB/sec     1.01   353.1±84.20µs     6.9 GB/sec
decode_stream/nested/8192x8x4             1.00  711.1±166.77µs     6.9 GB/sec     1.02  722.1±168.21µs     6.8 GB/sec
decode_stream/variable/65536x1x4          1.00  1168.6±178.60µs     7.5 GB/sec    1.03  1198.3±170.14µs     7.3 GB/sec
decode_stream/variable/65536x4x4          1.00      5.1±0.75ms     6.9 GB/sec     1.01      5.2±0.75ms     6.8 GB/sec
decode_stream/variable/65536x8x4          1.05     11.9±1.80ms     5.9 GB/sec     1.00     11.3±1.46ms     6.2 GB/sec
decode_stream/variable/8192x1x4           1.00   134.3±20.74µs     8.2 GB/sec     1.03   138.0±20.65µs     8.0 GB/sec
decode_stream/variable/8192x4x4           1.00   580.6±97.42µs     7.6 GB/sec     1.04   606.7±91.19µs     7.2 GB/sec
decode_stream/variable/8192x8x4           1.00  1225.5±178.45µs     7.2 GB/sec    1.04  1274.7±221.45µs     6.9 GB/sec
do_put_dictionary/dict/hydrate/65536x1    1.01   388.1±13.78µs   647.9 MB/sec     1.00   382.4±14.35µs   657.4 MB/sec
do_put_dictionary/dict/hydrate/65536x4    1.00  1333.6±28.89µs   754.1 MB/sec     1.17  1553.8±155.56µs   647.2 MB/sec
do_put_dictionary/dict/hydrate/65536x8    1.00      3.2±0.36ms   638.4 MB/sec     1.22      3.9±0.37ms   521.4 MB/sec
do_put_dictionary/dict/hydrate/8192x1     1.00     90.3±0.86µs   361.7 MB/sec     1.00     90.1±1.41µs   362.5 MB/sec
do_put_dictionary/dict/hydrate/8192x4     1.00    204.1±2.32µs   640.2 MB/sec     1.03    211.2±5.17µs   618.7 MB/sec
do_put_dictionary/dict/hydrate/8192x8     1.05   387.4±16.59µs   674.4 MB/sec     1.00    369.4±6.27µs   707.3 MB/sec
do_put_dictionary/dict/resend/65536x1     1.01    108.0±1.31µs     2.3 GB/sec     1.00    106.6±1.76µs     2.3 GB/sec
do_put_dictionary/dict/resend/65536x4     1.00    289.7±3.10µs     3.4 GB/sec     1.05   303.3±11.19µs     3.2 GB/sec
do_put_dictionary/dict/resend/65536x8     1.00    505.6±4.86µs     3.9 GB/sec     1.01   512.2±15.40µs     3.8 GB/sec
do_put_dictionary/dict/resend/8192x1      1.00     60.1±1.01µs   543.4 MB/sec     1.03     61.7±0.73µs   529.6 MB/sec
do_put_dictionary/dict/resend/8192x4      1.00     81.6±0.87µs  1601.9 MB/sec     1.01     82.7±0.98µs  1579.5 MB/sec
do_put_dictionary/dict/resend/8192x8      1.00    114.6±1.47µs     2.2 GB/sec     1.01    115.7±3.00µs     2.2 GB/sec
encode/fixed/65536x1                      1.00      9.8±0.02µs    49.7 GB/sec     1.02     10.0±0.01µs    48.9 GB/sec
encode/fixed/65536x4                      1.00     49.7±0.12µs    39.3 GB/sec     1.00     49.8±0.18µs    39.2 GB/sec
encode/fixed/65536x8                      1.00  1092.7±59.10µs     3.6 GB/sec     1.03  1124.9±42.18µs     3.5 GB/sec
encode/fixed/8192x1                       1.00      3.2±0.01µs    19.1 GB/sec     1.01      3.2±0.01µs    18.9 GB/sec
encode/fixed/8192x4                       1.01      8.7±0.02µs    28.0 GB/sec     1.00      8.7±0.02µs    28.2 GB/sec
encode/fixed/8192x8                       1.00     17.0±0.04µs    28.7 GB/sec     1.02     17.4±0.04µs    28.0 GB/sec
encode/nested/65536x1                     1.00     29.0±0.24µs    42.1 GB/sec     1.46     42.3±0.24µs    28.9 GB/sec
encode/nested/65536x4                     1.00  1423.8±17.18µs     3.4 GB/sec     1.15  1637.1±130.88µs     3.0 GB/sec
encode/nested/65536x8                     1.00      3.0±0.20ms     3.2 GB/sec     1.01      3.1±0.01ms     3.2 GB/sec
encode/nested/8192x1                      1.11      6.4±0.01µs    23.8 GB/sec     1.00      5.8±0.01µs    26.4 GB/sec
encode/nested/8192x4                      1.00     21.3±0.06µs    28.7 GB/sec     1.01     21.4±0.06µs    28.5 GB/sec
encode/nested/8192x8                      1.03     49.0±0.52µs    25.0 GB/sec     1.00     47.5±0.10µs    25.7 GB/sec
encode/variable/65536x1                   1.28     79.4±0.50µs    27.7 GB/sec     1.00     61.9±1.36µs    35.5 GB/sec
encode/variable/65536x4                   1.00      2.4±0.01ms     3.7 GB/sec     1.14      2.7±0.30ms     3.3 GB/sec
encode/variable/65536x8                   1.02      5.6±0.39ms     3.1 GB/sec     1.00      5.5±0.25ms     3.2 GB/sec
encode/variable/8192x1                    1.56     10.8±0.01µs    25.4 GB/sec     1.00      6.9±0.01µs    39.7 GB/sec
encode/variable/8192x4                    1.08     30.1±0.05µs    36.6 GB/sec     1.00     27.9±0.10µs    39.3 GB/sec
encode/variable/8192x8                    1.00     79.7±0.79µs    27.6 GB/sec     1.01     80.2±0.11µs    27.4 GB/sec
roundtrip/fixed/65536x1                   1.00    311.3±4.95µs  1606.7 MB/sec     1.01    314.8±5.47µs  1588.8 MB/sec
roundtrip/fixed/65536x4                   1.03  1247.4±88.52µs  1603.6 MB/sec     1.00  1211.3±75.23µs  1651.4 MB/sec
roundtrip/fixed/65536x8                   1.05      2.3±0.17ms  1751.3 MB/sec     1.00      2.2±0.02ms  1847.0 MB/sec
roundtrip/fixed/8192x1                    1.00     92.5±3.04µs   676.5 MB/sec     1.00     92.9±1.08µs   673.9 MB/sec
roundtrip/fixed/8192x4                    1.00    200.1±2.99µs  1251.3 MB/sec     1.00    199.7±2.10µs  1253.6 MB/sec
roundtrip/fixed/8192x8                    1.00    337.5±6.77µs  1483.4 MB/sec     1.02    344.0±6.94µs  1455.7 MB/sec
roundtrip/nested/65536x1                  1.00   872.7±64.94µs  1432.6 MB/sec     1.02   889.5±57.66µs  1405.4 MB/sec
roundtrip/nested/65536x4                  1.06      4.4±0.28ms  1144.9 MB/sec     1.00      4.1±0.12ms  1210.2 MB/sec
roundtrip/nested/65536x8                  1.00      8.7±0.37ms  1153.7 MB/sec     1.08      9.4±0.73ms  1066.4 MB/sec
roundtrip/nested/8192x1                   1.00    158.2±5.40µs   989.1 MB/sec     1.02    161.7±6.18µs   967.3 MB/sec
roundtrip/nested/8192x4                   1.00   481.9±23.76µs  1298.5 MB/sec     1.00   483.2±26.96µs  1295.1 MB/sec
roundtrip/nested/8192x8                   1.02   939.6±51.78µs  1332.1 MB/sec     1.00   923.8±53.19µs  1354.8 MB/sec
roundtrip/variable/65536x1                1.00  1250.9±53.93µs  1798.9 MB/sec     1.05  1318.5±87.29µs  1706.6 MB/sec
roundtrip/variable/65536x4                1.06      8.2±0.60ms  1101.2 MB/sec     1.00      7.7±0.35ms  1166.8 MB/sec
roundtrip/variable/65536x8                1.01     15.0±0.67ms  1202.7 MB/sec     1.00     14.8±0.82ms  1214.1 MB/sec
roundtrip/variable/8192x1                 1.00    207.9±6.60µs  1353.4 MB/sec     1.00    207.3±6.19µs  1357.7 MB/sec
roundtrip/variable/8192x4                 1.01   703.7±32.28µs  1599.8 MB/sec     1.00   694.8±24.01µs  1620.2 MB/sec
roundtrip/variable/8192x8                 1.00  1219.2±22.08µs  1846.6 MB/sec     1.06  1292.2±87.15µs  1742.3 MB/sec

Resource Usage

Metric        base (merge-base)   branch
------        -----------------   ------
Wall time     910.2s              940.2s
Peak memory   220.5 MiB           220.0 MiB
Avg memory    90.2 MiB            108.9 MiB
CPU user      915.3s              952.5s
CPU sys       137.0s              128.3s
Peak spill    0 B                 0 B

Rich-T-kid force-pushed the rich-T-kid/re-use-allocations branch from e9f85aa to 9fadcd7 on June 27, 2026 15:44
@Rich-T-kid

Contributor Author

🤔 I've been thinking about changing the benchmarks so that Tokio polling is not a factor. It may be more useful to benchmark the top-level entry functions directly, i.e.:

  • encode path: encode_batch()
  • decode path: extract_message()

This would keep the Tokio runtime from playing such a large role in polling the streams, which I think is causing the large variance between benchmark runs. For example:

decode/fixed/65536x8                      7.99      4.5±0.49ms     3.5 GB/sec     1.00   563.9±12.87µs    27.7 GB/sec
decode/nested/65536x8                     2.38     15.3±2.01ms     2.6 GB/sec     1.00      6.4±1.36ms     6.1 GB/sec

The decode path wasn't touched in this PR, but it's showing a drastic change.

We have no control over the Tokio runtime, so it makes sense to avoid it in the benchmarks if we can.
@alamb do you have any opinions on this?
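
A minimal sketch of what timing a synchronous entry point directly could look like, with no async runtime in the measured loop. It uses a plain `std::time::Instant` loop instead of criterion to stay self-contained, and `encode_stub` is a hypothetical placeholder rather than the real `encode_batch()`, whose signature is not shown in this thread.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical placeholder for a synchronous encode entry point:
/// writes a 4-byte little-endian length prefix followed by the payload.
fn encode_stub(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(payload.len() + 4);
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(payload);
    out
}

/// Calls `f` directly `iters` times and returns the mean time per call.
/// No runtime or stream polling sits between the timer and the function.
fn time_direct<R, F: FnMut() -> R>(iters: u32, mut f: F) -> Duration {
    assert!(iters > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from discarding the call.
        black_box(f());
    }
    start.elapsed() / iters
}
```

A call such as `time_direct(1_000, || encode_stub(&payload))` then measures only the function under test; a criterion benchmark could wrap the same closure in `b.iter(...)`.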

Labels

arrow (Changes to the arrow crate), performance

3 participants